Salience as a Simplifying Metaphor for Natural Language Generation
Authors
Abstract
We have developed a simple yet effective technique for planning the generation of natural language texts that describe photographs of natural scenes as processed by the UMass VISIONS system. The texts follow the ordering on the scene’s objects that is imposed by their visual salience, an ordering which we believe is naturally computed as a by-product of visual processing and is thus available, for free, as the basis for generating simple but effective texts without requiring the complex planning machinery often applied in generation. We suggest that it should be possible to find structural analogs to visual salience in other domains and to build comparably simple generation schemes based on them. We look briefly at how one such analogy might be drawn for the task of tutoring novice PASCAL programmers.

I Natural language generation and the superhuman-human fallacy

Taken in its general form, the problem of deciding what to say is a planning problem of great complexity. When speaking carefully and deliberately, a person will attempt to satisfy many goals from different sources simultaneously: rhetorical, tutorial, affective, and descriptive, among others. Utterances are intended to obey strict constraints deriving from the limited expressive power of the syntax and vocabulary of natural language and from the requirement to maintain the linguistic coherency of the discourse context established by what has been said up to that point. In addition, utterances must do all this while being reasonably short in length and precise in style if the audience is not to become bored or confused.

It is no wonder, then, that the ability to speak or write well does not come easily. Even though we all use language constantly, relatively few of us have the skill of a Mark Twain or a Winston Churchill. Everyday communication does not appear to require optimum linguistic performance.
This report describes work done in the Department of Computer and Information Science at the University of Massachusetts. It was supported in part by National Science Foundation grant IST 8104984 (Michael Arbib and David McDonald, Co-Principal Investigators).

In this light, we must consider whether we have been making the generation problem for computers more difficult than it actually is for people: the superhuman-human fallacy. Should we require our computers to speak any more effectively than we do ourselves? Most of us, as we speak, notice when we have left something out or inadvertently given the wrong emphasis, and we correct our mistakes by interrupting or modifying what we were about to say next; in explanations we use feedback from our audience, such as questions or puzzled looks, to dynamically adjust our vocabulary and level of detail. We should seriously consider designing our natural language generation systems on a similar basis: adopting an expedient and computationally efficient, if “leaky”, planning process and compensating for it by monitoring and attending to user questions.

At the University of Massachusetts we have developed just such an expedient planning system, which we use in conjunction with a highly efficient (i.e. quasi-realtime) text generator. Taking as input a simulation of the output of a computer vision system, the planner determines the order in which objects will be mentioned in the text and what will be said about them. It feeds this information via a pipeline to the generator, where grammatical constraints determine the exact phrasing and local rules (such as pronominalization and ellipsis) are applied to maintain the coherency of the discourse. The key to the planner’s simplicity is its reliance on the notion of “salience”: objects are introduced into the text according to their relative importance in the conceptual source of the text.
The decision as to what objects, properties, and relations to leave out, a source of considerable labor in some generation systems (e.g. Mann and Moore [6], McKeown [5]), is handled trivially here by defining a cut-off salience rating below which objects are ignored.

The task for which we developed this facility, the production of short paragraphs describing photographs of houses, is deliberately one in which the common sense notion of visual salience is vivid and widely shared by members of this culture. People interpret what is important about a picture (what it is a picture “of”) according to a shared set of conventions involving the size and centrality of the objects shown, coupled with a sense of what is normal or expected: a large stained-glass window on an otherwise ordinary New England farm house would be highly salient; similarly, a normally unimportant part of the scene, such as the mailbox, can be artificially raised in salience if framed prominently in the foreground of the picture.

From: AAAI-82 Proceedings. Copyright ©1982, AAAI (www.aaai.org). All rights reserved.

II Our Generation System

As of this writing, the salience-based planner (the subject of Conklin’s Ph.D. thesis) has been implemented, and its pipeline to the text generator (McDonald’s system MUMBLE [9]) has been hand-simulated. The house scenes which are the source of the text are very similar to those used in the research of the UMass “VISIONS” system [10] (see Figure 1). Their representation is also presently hand-simulated: the planner works from a KL-ONE data base of the objects in the scene and the spatial relations between them. This data base was designed in close collaboration with members of the VISIONS project and reflects the actual kinds of information they expect to extract from a visual scene.
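The cut-off rule described in this section is simple enough to sketch directly. The following Python fragment is only an illustration under stated assumptions: the object names, the numeric ratings, the threshold value, and the function name `select_mentionable` are all hypothetical and do not come from the system itself, which works from a KL-ONE data base rather than a dictionary.

```python
# Hypothetical scene: object name -> salience rating.
# All names and values here are invented for illustration.
scene_salience = {
    "house": 6.5,
    "fence": 5.0,
    "door": 4.2,
    "mailbox": 3.1,
    "bush": 1.0,
}


def select_mentionable(salience, cutoff):
    """Keep only the objects whose rating is at or above the cut-off.

    Objects rated below the cut-off are ignored, so no further reasoning
    about what to leave out is needed.
    """
    return {name: rating for name, rating in salience.items() if rating >= cutoff}


mentionable = select_mentionable(scene_salience, cutoff=3.0)
print(sorted(mentionable))  # ['door', 'fence', 'house', 'mailbox']
```

The point of the sketch is that omission becomes a single comparison per object, in contrast to planners that must reason explicitly about which material to leave out.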
The salience ratings with which the objects in the visual representation are annotated were derived empirically through extensive psychological testing of human subjects [3]. The subjects both rated the objects in each of a series of pictures on a zero to seven scale and wrote short paragraphs describing the scenes. The objects’ ratings were quite consistent across subjects and sessions of the experiment. The paragraphs provide an objective base-line for the kind of style and overall organization that should be generated by the system.

Fig. 1. One of the pictures used in the studies and an example of the kind of descriptive paragraph that subjects wrote about it. “This is a picture of a white house with a fence in front of it. The house has a red door and the fence has a red gate. There is a driveway beside the house, and a tree next to the driveway. In the foreground is a mailbox. It is a cloudy day in winter.”

Given the salience data, the planning algorithm runs as follows (see also [4]). The objects in the scene are placed in a list, the “Unused Salient Object List”, in decreasing order from most to least salient. The properties of the objects (such as color, size, or style) and their relative spatial relations can be accessed from the general scene data base when desired; one can, for example, ask for the most salient relationship in which a particular object is involved (by definition, relations acquire their salience from the objects they relate). Objects are taken from the “Unused Salient Object List” (shortening the list in the process), packaged with selected properties and relations, and sent to the generator by the action of a collection of strictly local rhetorical rules. The rules are couched as productions, have relative priorities, and are organized into packets according to when they apply: essentially the same architecture as Marcus used in his natural language parser [7].
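The control loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the planner itself: the class and function names are hypothetical, the ratings are invented, and the rhetorical production rules and packets are not modeled at all. One further assumption is flagged in the code: the text says only that relations acquire their salience from the objects they relate, so the sketch ranks an object's relations by the salience of the other participant, which is one possible reading.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    name: str
    salience: float                      # rating on the zero-to-seven scale
    properties: list = field(default_factory=list)


@dataclass
class Relation:
    kind: str                            # e.g. "in-front-of"
    subject: str
    other: str


def most_salient_relation(name, relations, objects, exclude=()):
    """Return the not-yet-mentioned relation involving `name` whose other
    participant is most salient, or None if there is none.

    ASSUMPTION: ranking by the other participant's salience is our own
    reading of "relations acquire their salience from the objects they relate".
    """
    def partner(rel):
        return rel.other if rel.subject == name else rel.subject

    involved = [r for r in relations
                if name in (r.subject, r.other) and r not in exclude]
    if not involved:
        return None
    return max(involved, key=lambda r: objects[partner(r)].salience)


def plan_description(objects, relations):
    """Build (object, properties, relation) packages in decreasing salience order."""
    # The "Unused Salient Object List": most salient object first.
    unused = sorted(objects.values(), key=lambda o: o.salience, reverse=True)
    mentioned = []                       # relations already handed to the generator
    packages = []
    while unused:
        current = unused.pop(0)          # taking an object shortens the list
        rel = most_salient_relation(current.name, relations, objects, mentioned)
        if rel is not None:
            mentioned.append(rel)
        packages.append((current.name, list(current.properties), rel))
    return packages


# Invented example data, loosely modeled on the house scenes.
objects = {o.name: o for o in [
    SceneObject("house", 6.5, ["white"]),
    SceneObject("fence", 5.0),
    SceneObject("mailbox", 3.0),
]}
relations = [
    Relation("in-front-of", "fence", "house"),
    Relation("next-to", "mailbox", "fence"),
]

for name, props, rel in plan_description(objects, relations):
    print(name, props, rel.kind if rel else None)
```

In this sketch the packages are simply collected in a list; in the system described here they would be passed down the pipeline to the generator, with the rhetorical rules deciding which properties and relations accompany each object.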
This production-rule architecture allows us to incorporate object-specific rules (such as that one always sees houses introduced with one of their properties: “a white house” or “a New England farm house”, and never simply as “a house”) and also simple stylistic rules, such as maintaining sentences of an appropriate length. The process proceeds by successively taking the first object on the list (i.e. the most salient unmentioned object), making it the local “current item”, and proceeding to describe its most salient properties and relations, finally “popping” the list of unmentioned objects and moving on to describe the next most salient object.

The scene descriptions produced by this process will never win a prize for good literature. They are, however, apparently effective as descriptions: as judged by informal trials (so far only a few), paragraphs generated automatically on the basis of the salience ratings derived from the experiments suffice to pick out the picture they describe from other pictures of the same subject taken from a different camera angle. Furthermore, they provide a base line for potentially measuring the “value-added” of a full-scale global planning system that would be capable of reasoning about and directing the larger-scale rhetorical organization of the text (say, one on the model of Appelt [1], or McKeown [5]).

III Where does salience come from?

We claim that the annotation of an object’s visual salience can be provided as a natural part of the perception process. For example, one aspect of salience stems from unexpectedness: items which are not predicted by, or are inconsistent with, high-level world knowledge are unusual and therefore salient. Also, an item’s size and centrality in the picture are clearly factors in that item’s salience.
Specifically, the record of an object’s relative salience would arise from the perceptual process’s explicitly combining: 1) the weighting the object contributed to the judgement that the scene was what it was, 2) the object’s